Discovery and recognition of motion primitives in human activities
We present a novel framework for the automatic discovery and recognition of motion primitives in videos of human activities. Given the 3D pose of a human in a video, human motion primitives are discovered by optimizing the 'motion flux', a quantity which captures the motion variation of a group of skeletal joints. A normalization of the primitives is proposed in order to make them invariant with respect to a subject's anatomical variations and to the data sampling rate. The discovered primitives are unknown and unlabeled, and are grouped into classes without supervision via a hierarchical non-parametric Bayes mixture model. Once classes are determined and labeled, they are further analyzed to establish models for recognizing the discovered primitives. Each primitive model is defined by a set of learned parameters.
Given new video data and the estimated pose of the subject appearing in the video, the motion is segmented into primitives, which are recognized with a probability given by the parameters of the learned models.
Using our framework we build a publicly available dataset of human motion primitives, using sequences taken from well-known motion capture datasets. We expect that our framework, by providing an objective way of discovering and categorizing human motion, will be a useful tool in numerous research fields, including video analysis, human-inspired motion generation, learning by demonstration, intuitive human-robot interaction, and human behavior analysis.
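The proposed normalization makes primitives invariant to a subject's anatomy and to the capture rate. A minimal sketch of such a normalization, assuming a per-frame centering and global rescaling plus resampling to a fixed number of frames (the function name and the exact scheme are illustrative, not the paper's definition):

```python
import numpy as np

def normalize_primitive(joints, n_frames=30):
    """Normalize a motion primitive so it is comparable across subjects
    and capture rates (illustrative sketch, not the paper's method).

    joints: array of shape (T, J, 3) -- T frames, J skeletal joints in 3D.
    Returns an array of shape (n_frames, J, 3).
    """
    T, J, _ = joints.shape
    # Invariance to anatomy: center each frame on the skeleton centroid
    # and rescale so the average joint distance from it is 1.
    centroid = joints.mean(axis=1, keepdims=True)          # (T, 1, 3)
    centered = joints - centroid
    scale = np.linalg.norm(centered, axis=2).mean()
    centered = centered / scale
    # Invariance to sampling rate: resample each joint trajectory to a
    # fixed number of frames by linear interpolation.
    src = np.linspace(0.0, 1.0, T)
    dst = np.linspace(0.0, 1.0, n_frames)
    out = np.empty((n_frames, J, 3))
    for j in range(J):
        for d in range(3):
            out[:, j, d] = np.interp(dst, src, centered[:, j, d])
    return out
```

With this scheme, uniformly scaling a skeleton (a taller subject) leaves the normalized primitive unchanged, and primitives captured at different frame rates end up with the same length.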
Vision-based deep execution monitoring
Execution monitoring of high-level robot actions can be effectively improved by visually monitoring the state of the world in terms of the preconditions and postconditions that hold before and after the execution of an action. Furthermore, a policy for choosing where to look, either to verify the relations that specify the pre- and postconditions or to refocus after a failure, can greatly improve robot execution in an uncharted environment. Thanks to the impressive results of deep learning, it is now possible to rely strongly on visual perception and assume that the environment is observable. In this work we present visual execution monitoring for a robot executing tasks in an uncharted lab environment. The execution monitor interacts with the environment via a visual stream that uses two DCNNs to recognize the objects the robot has to deal with and manipulate, and a non-parametric Bayes estimation to discover relations from the DCNN features. To recover from lack of focus and from failures due to missed objects, we resort to visual search policies learned via deep reinforcement learning.
Rigid tool affordance matching points of regard
In this abstract we briefly introduce the analysis of simple rigid object affordances by experimentally establishing the relation between the points of regard of subjects before grasping an object and the fingertip points of contact once the object is grasped. The analysis shows that there is a strong relation between these data, thus supporting the hypothesis that people figure out how objects are afforded according to their functionality.
Bayesian non-parametric inference for manifold based MoCap representation
We propose a novel approach to human action recognition from motion capture (MoCap) data, based on grouping sub-body parts. By representing action configurations as manifolds, joint positions are mapped onto a subspace via principal geodesic analysis. The reduced space is still highly informative and allows for classification based on a non-parametric Bayesian approach, generating behaviors for each sub-body part. Having partitioned the set of joints, poses relative to a sub-body part are exchangeable, given a specified prior, and can elicit, in principle, infinitely many behaviors. The generation of these behaviors is specified by a Dirichlet process mixture. We show with several experiments that the recognition gives very promising results, outperforming methods that require temporal alignment.
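A Dirichlet process mixture over the reduced pose features of one sub-body part can be approximated with a truncated variational implementation, for example scikit-learn's BayesianGaussianMixture. The sketch below uses synthetic 2-D features as a stand-in for the subspace coordinates; it illustrates how the data can "switch off" unneeded components, leaving only the behaviors it supports:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Toy stand-in for reduced pose features of one sub-body part:
# two well-separated clusters ("behaviors") in a 2-D subspace.
features = np.vstack([
    rng.normal(loc=-3.0, scale=0.3, size=(100, 2)),
    rng.normal(loc=+3.0, scale=0.3, size=(100, 2)),
])

# Truncated Dirichlet-process mixture: up to 10 components, but the
# concentration prior lets the fit leave superfluous components empty.
dpm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.1,
    random_state=0,
).fit(features)

labels = dpm.predict(features)
n_behaviors = np.unique(labels).size   # effective number of behaviors
```

The truncation level (here 10) only bounds the approximation; in principle the Dirichlet process allows infinitely many behaviors, matching the exchangeability argument above.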
Component-wise modeling of articulated objects
We introduce a novel framework for modeling articulated objects based on the aspects of their components. By decomposing the object into components, we divide the problem into smaller modeling tasks. After obtaining 3D models for each component aspect by employing a shape deformation paradigm, we merge them to form the object components. The final model is obtained by assembling the components with an optimization scheme that fits the respective 3D models to the corresponding apparent contours in a reference pose. The results suggest that our approach can produce realistic 3D models of articulated objects in reasonable time.
Visual search and recognition for robot task execution and monitoring
Visual search for relevant targets in the environment is a crucial robot skill. We propose a preliminary framework for the execution monitor of a robot task, attending to the robot's need to visually search the environment for the targets involved in the task. Visual search is also relevant for recovering from a failure. The framework exploits deep reinforcement learning to acquire a "common sense" scene structure, and it takes advantage of a deep convolutional network to detect objects and the relevant relations holding between them. The framework builds on these methods to introduce a vision-based execution monitor, which uses classical planning as a backbone for task execution. Experiments show that with the proposed vision-based execution monitor the robot can complete simple tasks and can autonomously recover from failures.
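The work learns where-to-look policies with deep reinforcement learning; as a drastically simplified illustration of the underlying idea only, here is a tabular Q-learning stand-in where the agent learns which of a few discrete view directions to try first (the view names, reward scheme, and single-state setting are all assumptions, not the paper's formulation):

```python
import random

# Toy stand-in for a visual-search policy: the robot chooses among a few
# discrete "look" directions; looking where the target is yields reward 1.
VIEWS = ["left", "center", "right", "up"]
TARGET = "right"                      # hidden target location (toy example)

def reward(view):
    return 1.0 if view == TARGET else 0.0

# Tabular Q-learning over a single state (one decision: where to look).
q = {v: 0.0 for v in VIEWS}
alpha, epsilon = 0.1, 0.2
rng = random.Random(0)

for _ in range(500):
    # epsilon-greedy action selection
    if rng.random() < epsilon:
        view = rng.choice(VIEWS)
    else:
        view = max(q, key=q.get)
    # one-step update (bandit-like toy: no successor state)
    q[view] += alpha * (reward(view) - q[view])

best_view = max(q, key=q.get)
```

In the actual framework the state is a visual observation and the value function is a deep network, but the exploration/exploitation loop has the same shape.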
Towards an understanding of human activities: from the skeleton to the space
This thesis describes the research undertaken for a Ph.D. project in Computer Vision, whose main objective is to tackle human activity recognition from RGB videos.
Human activity recognition from videos aims to recognize which human activities are taking place in a video, considering only cues extracted directly from video frames.
The related applications are manifold: healthcare monitoring, such as rehabilitation or stress monitoring, surveillance of indoor and outdoor activities, human-machine interaction, entertainment, etc.
An important distinction must be made before proceeding further: the one between action and activity.
Actions are generally described in the literature as single-person movements that may be composed of multiple simple gestures organized temporally, such as walking, waving, or punching. Gestures are instead elementary movements of a body part. Activities, on the other hand, are described as involving two or more persons and/or objects, or a single person performing complex actions, i.e. a sequence of actions.
Human activity recognition has long been one of the main subjects of study of the computer vision and machine learning communities, and it is still a hot topic due to its complexity.
Developing a system for human activity recognition is challenging due to well-known computer vision problems: body-part occlusions, lighting conditions, and image resolution are only a subset of them. Furthermore, similarities between activity classes make the problem even harder. Activities in the same class may be exhibited by distinct persons with distinct body movements, and activities in different classes may be hard to discriminate because they may carry analogous information. The way in which humans execute an activity depends on their habits, which makes detecting activities quite difficult.
The main conclusion that emerges from a deep analysis of the available literature on activity recognition is that a robust activity recognition system has to be context-aware: not only is human motion important to achieve good performance, but other relevant cues that can be extracted from videos also have to be considered.
The state-of-the-art research in computer vision still lacks a complete framework for context-based human activity recognition that takes into account the scene where activities take place, object analysis, 3D human motion analysis, and the interdependence between activity classes.
This thesis describes computer vision frameworks that enable the robust recognition of human activities by explicitly considering the scene context.
The main contributions described here for context-aware activity recognition concern 3D modeling of articulated and complex objects, 3D human pose estimation from single images, and a method for activity recognition based on human motion primitives. Four major publications are presented, together with an extensive literature review covering computer vision areas such as 3D object modeling, 3D human pose estimation, human action recognition, action recognition based on action and motion primitives, and context-based human activity recognition.
Future work on this research will be to build a complete system for context-based activity recognition, exploiting the several frameworks introduced so far.
Single image object modeling based on BRDF and r-surfaces learning
A methodology for 3D surface modeling from a single image is proposed. The principal novelty is the modeling of concave and specular surfaces without any externally imposed prior. The main idea of the method is to use BRDFs and generated rendered surfaces to transfer the normal field, computed for the generated samples, to the unknown surface. The transferred information is adequate to blow and sculpt the segmented image mask into a bas-relief of the object. The object surface is further refined based on a photo-consistency formulation that relates the original image and the modeled object for error minimization.
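The "blow and sculpt" step starts from a segmented silhouette. A crude, commonly used baseline for inflating a binary mask into a bas-relief (not the paper's method, and the spherical profile is an assumption) is a distance-transform height map:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def inflate_mask(mask):
    """Turn a binary object mask into a smooth bas-relief height map
    (illustrative 'balloon' inflation, not the paper's procedure).

    mask: 2-D boolean array, True inside the object silhouette.
    Returns a float height map: 0 on the background, rising toward
    the interior of the mask.
    """
    # Distance of every inside pixel from the silhouette boundary.
    dist = distance_transform_edt(mask)
    if dist.max() == 0:
        return dist
    d = dist / dist.max()
    # Spherical profile: height = sqrt(1 - (1 - d)^2), so the surface
    # meets the boundary tangentially like an inflated membrane.
    return np.sqrt(np.clip(1.0 - (1.0 - d) ** 2, 0.0, 1.0)) * mask
```

Such a height map gives the initial bas-relief that photo-consistency refinement can then sculpt toward the observed shading.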
Human motion primitive discovery and recognition
We present a novel framework for the automatic discovery and recognition of human motion primitives from motion capture data. Human motion primitives are discovered by optimizing the 'motion flux', a quantity which depends on the motion of a group of skeletal joints. Models of each primitive category are computed via non-parametric Bayes methods, and recognition is performed based on their geometric properties. A normalization of the primitives is proposed in order to make them invariant with respect to anatomical variations and to the data sampling rate. Using our framework we build a publicly available dataset of human motion primitives based on motion capture sequences taken from well-known datasets. We expect that our framework, by providing an objective way of discovering and categorizing human motion, will be a useful tool in numerous research fields related to robotics, including human-inspired motion generation, learning by demonstration, and intuitive human-robot interaction.